{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Supercharging Web Crawling with Crawl4AI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installation\n",
    "\n",
    "To get started with the `crawl4ai` integration in AG2, follow these steps:\n",
    "\n",
    "1. Install AG2 with the `crawl4ai` extra:\n",
    "   ```bash\n",
    "   pip install -U ag2[openai,crawl4ai]\n",
    "   ```\n",
    "   > **Note:** If you have been using `autogen` or `ag2`, all you need to do is upgrade it using:  \n",
    "   > ```bash\n",
    "   > pip install -U autogen[openai,crawl4ai]\n",
    "   > ```\n",
    "   > or  \n",
    "   > ```bash\n",
    "   > pip install -U ag2[openai,crawl4ai]\n",
    "   > ```\n",
    "   > as `autogen`, and `ag2` are aliases for the same PyPI package.  \n",
    "2. Set up Playwright:\n",
    "   \n",
    "   ```bash\n",
    "   # Installs Playwright and browsers for all OS\n",
    "   playwright install\n",
    "   # Additional command, mandatory for Linux only\n",
    "   playwright install-deps\n",
    "   ```\n",
    "\n",
    "3. For running the code in Jupyter, use `nest_asyncio` to allow nested event loops.\n",
    "    ```bash\n",
    "    pip install nest_asyncio\n",
    "    ```\n",
    "\n",
    "\n",
    "You're all set! Now you can start using browsing features in AG2.\n",
    "\n",
    "\n",
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import nest_asyncio\n",
    "from pydantic import BaseModel\n",
    "\n",
    "from autogen import AssistantAgent, UserProxyAgent\n",
    "from autogen.tools.experimental import Crawl4AITool\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## LLM-Free Crawl4AI\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "config_list = [{\"api_type\": \"openai\", \"model\": \"gpt-5-nano\", \"api_key\": os.environ[\"OPENAI_API_KEY\"]}]\n",
    "\n",
    "llm_config = {\n",
    "    \"config_list\": config_list,\n",
    "}\n",
    "\n",
    "user_proxy = UserProxyAgent(name=\"user_proxy\", human_input_mode=\"NEVER\")\n",
    "assistant = AssistantAgent(name=\"assistant\", llm_config=llm_config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "crawlai_tool = Crawl4AITool()\n",
    "\n",
    "crawlai_tool.register_for_execution(user_proxy)\n",
    "crawlai_tool.register_for_llm(assistant)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = user_proxy.initiate_chat(\n",
    "    recipient=assistant,\n",
    "    message=\"Get info from https://docs.ag2.ai/docs/Home\",\n",
    "    max_turns=2,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Crawl4AI with LLM\n",
    "\n",
    "> **Note:** [`Crawl4AI`](https://github.com/unclecode/crawl4ai) is built on top of [LiteLLM](https://github.com/BerriAI/litellm) and supports the same models as LiteLLM.\n",
    ">\n",
    "> We had great experience with `OpenAI`, `Anthropic`, `Gemini` and `Ollama`. However, as of this writing, `DeepSeek` is encountering some issues.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "config_list = [{\"api_type\": \"openai\", \"model\": \"gpt-5-nano\", \"api_key\": os.environ[\"OPENAI_API_KEY\"]}]\n",
    "\n",
    "llm_config = {\n",
    "    \"config_list\": config_list,\n",
    "}\n",
    "\n",
    "user_proxy = UserProxyAgent(name=\"user_proxy\", human_input_mode=\"NEVER\")\n",
    "assistant = AssistantAgent(name=\"assistant\", llm_config=llm_config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set llm_config to Crawl4AITool\n",
    "crawlai_tool = Crawl4AITool(llm_config=llm_config)\n",
    "\n",
    "crawlai_tool.register_for_execution(user_proxy)\n",
    "crawlai_tool.register_for_llm(assistant)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = user_proxy.initiate_chat(\n",
    "    recipient=assistant,\n",
    "    message=\"Get info from https://docs.ag2.ai/docs/Home\",\n",
    "    max_turns=2,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Crawl4AI with LLM & Schema for Structured Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "config_list = [{\"api_type\": \"openai\", \"model\": \"gpt-5-nano\", \"api_key\": os.environ[\"OPENAI_API_KEY\"]}]\n",
    "\n",
    "llm_config = {\n",
    "    \"config_list\": config_list,\n",
    "}\n",
    "\n",
    "user_proxy = UserProxyAgent(name=\"user_proxy\", human_input_mode=\"NEVER\")\n",
    "assistant = AssistantAgent(name=\"assistant\", llm_config=llm_config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Blog(BaseModel):\n",
    "    title: str\n",
    "    url: str\n",
    "\n",
    "\n",
    "# Set llm_config and extraction_model to Crawl4AITool\n",
    "crawlai_tool = Crawl4AITool(llm_config=llm_config, extraction_model=Blog)\n",
    "\n",
    "crawlai_tool.register_for_execution(user_proxy)\n",
    "crawlai_tool.register_for_llm(assistant)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "message = \"Extract all blog posts from https://docs.ag2.ai/blog\"\n",
    "result = user_proxy.initiate_chat(\n",
    "    recipient=assistant,\n",
    "    message=message,\n",
    "    max_turns=2,\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "front_matter": {
   "description": "Supercharging Web Crawling with Crawl4AI",
   "tags": [
    "tools",
    "browser-use",
    "webscraping",
    "function calling"
   ]
  },
  "kernelspec": {
   "display_name": ".venv-crawl4ai",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
